Large-scale genealogical information extraction from handwritten Quebec parish records

Authors

Abstract

This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information highly valuable for genetic, demographic and social studies of the Quebec population. From an image of parish records, our workflow is able to identify the acts and extract personal information. The workflow is divided into successive steps: page classification, text line detection, handwritten text recognition, named entity recognition, and act detection and classification. For all these steps, different machine learning models are compared. Once the information is extracted, validation rules designed by experts are then applied to standardize the extracted information and ensure its consistency with the type of act (birth, marriage, death). This validation step is able to reject records that are considered invalid or merged. The full workflow has been used to process over two million pages of Quebec parish registers from the 19-20th centuries. On a sample comprising 65% of the registers, 3.2 million acts were recognized. Verification of the birth and death acts from this sample shows that 74% of them are valid. These records will be integrated into the BALSAC database and linked together to recreate genealogical relations at large scale.
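As a rough illustration of the validation step described in the abstract, the sketch below checks an extracted act against the fields expected for its type and standardizes the values. The abstract does not give the experts' actual rules, so the field names, the `REQUIRED_FIELDS` table, and the `Act`, `standardize` and `validate` names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# ASSUMPTION: illustrative required fields per act type. The rules designed
# by the paper's experts are not listed in the abstract.
REQUIRED_FIELDS = {
    "birth": {"subject_name", "father_name", "mother_name", "date"},
    "marriage": {"groom_name", "bride_name", "date"},
    "death": {"subject_name", "date"},
}


@dataclass
class Act:
    """One act as it might come out of the recognition steps (hypothetical)."""
    act_type: str
    fields: Dict[str, str] = field(default_factory=dict)


def standardize(value: str) -> str:
    """Collapse repeated whitespace and title-case the value (illustrative only)."""
    return " ".join(value.split()).title()


def validate(act: Act) -> Tuple[bool, List[str]]:
    """Standardize the act's fields in place and return (is_valid, problems)."""
    required = REQUIRED_FIELDS.get(act.act_type)
    if required is None:
        return False, [f"unknown act type: {act.act_type}"]
    # Drop empty values, then standardize what remains.
    act.fields = {k: standardize(v) for k, v in act.fields.items() if v and v.strip()}
    missing = sorted(required - act.fields.keys())
    problems = [f"missing field: {name}" for name in missing]
    return not problems, problems
```

Under these assumptions, a death act with a name and a date passes, while a birth act lacking parent names is rejected with a list of the missing fields. Detecting merged acts, which the abstract also mentions, would need further rules that are not sketched here.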

Similar resources

Information Extraction from Echocardiography Records

Electronic health records are a rich source for medical information. However, large parts of clinical diagnosis reports are in textual form and are therefore not per se usable for statistical evaluations. To transform the information from an unstructured into a structured form is the goal of medical language processing. In this paper we want to propose an approach for the creation of a training...

Information Extraction from Historical Semi-Structured Handwritten Documents

In this paper, we describe our approach to extract salient events such as birth and death records from historical French parish documents that contain free-form handwritten text. The challenges posed by these documents to the current state of the art in handwriting recognition and information extraction go well beyond the generic challenges in recognizing handwritten text such as style variatio...

How to improve information extraction from German medical records

Vast amounts of medical information are still recorded as unstructured text. The knowledge contained in this textual data has a great potential to improve clinical routine care, to support clinical research, and to advance personalization of medicine. To access this knowledge, the underlying data has to be semantically integrated, an essential prerequisite to which is information extraction from...

Data-Driven Information Extraction from Chinese Electronic Medical Records

OBJECTIVE This study aims to propose a data-driven framework that takes unstructured free text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. MATERIALS AND METHODS Our framework uses a hybrid approach. It consists of constructin...

Context Related Extraction of Conceptual Information from Electronic Health Records

This paper discusses some language technologies applied for the automatic processing of Electronic Health Records in Bulgarian, in order to extract multi-layer conceptual chunks from medical texts. We consider an Information Extraction view to text processing, where semantic information is extracted using predefined templates. At the first step the templates are filled in with information about...

Journal

Journal title: International Journal on Document Analysis and Recognition

Year: 2023

ISSN: 1433-2833, 1433-2825

DOI: https://doi.org/10.1007/s10032-023-00427-w